73 research outputs found
Training Knowledge Graph Embedding Models
Knowledge graph embedding (KGE) models have become a popular means of making discoveries in knowledge graphs (e.g., RDF graphs) in an efficient and scalable manner. The key to the success of these models is their ability to learn low-rank vector representations for knowledge graph entities and relations. Despite the rapid development of KGE models, state-of-the-art approaches have mostly focused on new ways to represent embedding interaction functions (i.e., scoring functions). In this paper, we argue that the choice of other training components, such as the loss function, hyperparameters and negative sampling strategies, can also have a substantial impact on model efficiency. This area has been rather neglected by previous work, and our contribution towards closing this gap is a thorough analysis of possible choices of training loss functions, hyperparameters and negative sampling techniques. Finally, we investigate the effects of specific choices on the scalability and accuracy of knowledge graph embedding models.
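The three training components named in this abstract (scoring function, negative sampling, loss function) can be illustrated with a minimal sketch. The abstract does not say which choices the paper evaluates, so everything here is an assumption made for illustration: a TransE-style score, uniform tail corruption, a pairwise margin loss, and toy hand-picked embeddings.

```python
import random

def transe_score(h, r, t):
    """TransE-style score (assumed here): negative L1 distance between h + r and t."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def corrupt_tail(triple, entities, known, rng):
    """Uniform negative sampling: swap the tail for a random entity that
    does not produce a triple already known to be true."""
    s, p, _ = triple
    candidates = [e for e in entities if (s, p, e) not in known]
    return (s, p, rng.choice(candidates))

def margin_loss(pos_score, neg_score, margin=1.0):
    """Pairwise margin loss: zero once the positive outscores the negative by `margin`."""
    return max(0.0, margin - pos_score + neg_score)

# Toy data; the embeddings are hand-picked, not learned.
rng = random.Random(0)
entities = ["a", "b", "c"]
known = {("a", "likes", "b")}
ent_emb = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 2.0]}
rel_emb = {"likes": [1.0, 0.0]}

def score(triple):
    s, p, o = triple
    return transe_score(ent_emb[s], rel_emb[p], ent_emb[o])

pos = ("a", "likes", "b")
neg = corrupt_tail(pos, entities, known, rng)
loss = margin_loss(score(pos), score(neg))
```

Swapping `margin_loss` for another loss, or `corrupt_tail` for another sampling strategy, leaves the scoring function untouched, which is why these components can be studied independently.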
Embedding Cardinality Constraints in Neural Link Predictors
Neural link predictors learn distributed representations of entities and
relations in a knowledge graph. They are remarkably powerful in the link
prediction and knowledge base completion tasks, mainly due to the learned
representations that capture important statistical dependencies in the data.
Recent works in the area have focused on either designing new scoring functions
or incorporating extra information into the learning process to improve the
representations. Yet the representations are mostly learned from the observed
links between entities, ignoring commonsense or schema knowledge associated
with the relations in the graph. A fundamental aspect of the topology of
relational data is the cardinality information, which bounds the number of
predictions given for a relation between a minimum and maximum frequency. In
this paper, we propose a new regularisation approach to incorporate relation
cardinality constraints to any existing neural link predictor without affecting
their efficiency or scalability. Our regularisation term aims to impose
boundaries on the number of predictions with high probability, thus,
structuring the embeddings space to respect commonsense cardinality assumptions
resulting in better representations. Experimental results on Freebase, WordNet
and YAGO show that, given suitable prior knowledge, the proposed method
positively impacts the predictive accuracy of downstream link prediction tasks.
Comment: 8 pages, accepted at the 34th ACM/SIGAPP Symposium on Applied
Computing (SAC '19)
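One possible reading of a cardinality regularisation term is sketched below. The abstract does not give the formulation, so this is an illustrative assumption rather than the authors' method: the expected number of predicted links for one subject and relation is taken as the sum of sigmoid-transformed scores, and the term penalises that count when it falls outside hypothetical bounds `lower` and `upper`.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cardinality_penalty(scores, lower, upper):
    """Hinge-style penalty on the expected number of predicted links.

    `scores` are raw model scores for every candidate object of a fixed
    (subject, relation) pair; the expected count is the sum of their
    sigmoid probabilities (an assumption of this sketch).
    """
    expected = sum(sigmoid(s) for s in scores)
    return max(0.0, lower - expected) + max(0.0, expected - upper)

# Three candidates score high, so roughly 2.87 links are expected.
scores = [4.0, 3.0, 2.5, -5.0]
penalty_tight = cardinality_penalty(scores, lower=0, upper=2)  # bound violated
penalty_ok = cardinality_penalty(scores, lower=0, upper=3)     # bound respected
```

Because the penalty depends only on model scores, a term of this kind could be added to the training loss of any link predictor without changing its scoring function.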
Training, maintenance and repair of coffee (Coffea arábica) pulping machines for 100 coffee growers in the municipality of Taminango, Nariño.
In Colombia, and in the department of Nariño generally, including the municipality of Taminango, coffee growers are very dedicated to their field work. Unfortunately, in recent years they have neglected a very important practice, and this is directly affecting the quality and price of their product: the maintenance and repair of their coffee pulping machines. The pulper is a vital piece of equipment, because if pulping is not done properly it produces defects such as nipped and hulled beans, which markedly reduce the quality and price of the coffee (Coffea arábica).
The main objective of this work was to train the coffee growers of the municipality of Taminango to keep their pulpers in excellent condition, so that pulping is carried out under adequate conditions and yields a good-quality bean. The work had the direct participation of the Federación Nacional de Cafeteros, Comité Departamental de Nariño, and of the respective extension service in this municipality.
Thanks to the involvement of this institution, the coffee growers who benefited from the project and the students of the Universidad Nacional Abierta y a Distancia, 100 coffee growers were trained in wet and dry processing of coffee (Coffea arábica), identification and management of coffee defects, and maintenance and repair of coffee pulping machines. In addition, 100 pulping machines were repaired; the work included painting and sheet-metal repair, replacing the sleeve with a copper one, replacing the bearings, and calibration, so that the growers' equipment is in good condition to process their harvest and obtain a good-quality bean.
Accurate prediction of kinase-substrate networks using knowledge graphs
Phosphorylation of specific substrates by protein kinases is a key control mechanism for vital cell-fate decisions and other cellular processes. However, discovering specific kinase-substrate relationships is time-consuming and often rather serendipitous. Computational predictions alleviate these challenges, but the current approaches suffer from limitations like restricted kinome coverage and inaccuracy. They also typically utilise only local features without reflecting broader interaction context. To address these limitations, we have developed an alternative predictive model. It uses statistical relational learning on top of phosphorylation networks interpreted as knowledge graphs, a simple yet robust model for representing networked knowledge. Compared to a representative selection of six existing systems, our model has the highest kinome coverage and produces biologically valid high-confidence predictions not possible with the other tools. Specifically, we have experimentally validated predictions of previously unknown phosphorylations by the LATS1, AKT1, PKA and MST2 kinases in human. Thus, our tool is useful for focusing phosphoproteomic experiments, and facilitates the discovery of new phosphorylation reactions. Our model can be accessed publicly via an easy-to-use web interface (LinkPhinder).
Mortality from gastrointestinal congenital anomalies at 264 hospitals in 74 low-income, middle-income, and high-income countries: a multicentre, international, prospective cohort study
Summary
Background Congenital anomalies are the fifth leading cause of mortality in children younger than 5 years globally.
Many gastrointestinal congenital anomalies are fatal without timely access to neonatal surgical care, but few studies
have been done on these conditions in low-income and middle-income countries (LMICs). We compared outcomes of
the seven most common gastrointestinal congenital anomalies in low-income, middle-income, and high-income
countries globally, and identified factors associated with mortality.
Methods We did a multicentre, international prospective cohort study of patients younger than 16 years, presenting to
hospital for the first time with oesophageal atresia, congenital diaphragmatic hernia, intestinal atresia, gastroschisis,
exomphalos, anorectal malformation, and Hirschsprung’s disease. Recruitment was of consecutive patients for a
minimum of 1 month between October, 2018, and April, 2019. We collected data on patient demographics, clinical
status, interventions, and outcomes using the REDCap platform. Patients were followed up for 30 days after primary
intervention, or 30 days after admission if they did not receive an intervention. The primary outcome was all-cause,
in-hospital mortality for all conditions combined and each condition individually, stratified by country income status.
We did a complete case analysis.
Findings We included 3849 patients with 3975 study conditions (560 with oesophageal atresia, 448 with congenital
diaphragmatic hernia, 681 with intestinal atresia, 453 with gastroschisis, 325 with exomphalos, 991 with anorectal
malformation, and 517 with Hirschsprung’s disease) from 264 hospitals (89 in high-income countries, 166 in middle-income
countries, and nine in low-income countries) in 74 countries. Of the 3849 patients, 2231 (58·0%) were male.
Median gestational age at birth was 38 weeks (IQR 36–39) and median bodyweight at presentation was 2·8 kg (2·3–3·3).
Mortality among all patients was 37 (39·8%) of 93 in low-income countries, 583 (20·4%) of 2860 in middle-income
countries, and 50 (5·6%) of 896 in high-income countries (p<0·0001 between all country income groups).
Gastroschisis had the greatest difference in mortality between country income strata (nine [90·0%] of ten in low-income
countries, 97 [31·9%] of 304 in middle-income countries, and two [1·4%] of 139 in high-income countries;
p≤0·0001 between all country income groups). Factors significantly associated with higher mortality for all patients
combined included country income status (low-income vs high-income countries, risk ratio 2·78 [95% CI 1·88–4·11],
p<0·0001; middle-income vs high-income countries, 2·11 [1·59–2·79], p<0·0001), sepsis at presentation (1·20
[1·04–1·40], p=0·016), higher American Society of Anesthesiologists (ASA) score at primary intervention
(ASA 4–5 vs ASA 1–2, 1·82 [1·40–2·35], p<0·0001; ASA 3 vs ASA 1–2, 1·58, [1·30–1·92], p<0·0001), surgical safety
checklist not used (1·39 [1·02–1·90], p=0·035), and ventilation or parenteral nutrition unavailable when needed
(ventilation 1·96, [1·41–2·71], p=0·0001; parenteral nutrition 1·35, [1·05–1·74], p=0·018). Administration of
parenteral nutrition (0·61, [0·47–0·79], p=0·0002) and use of a peripherally inserted central catheter (0·65
[0·50–0·86], p=0·0024) or percutaneous central line (0·69 [0·48–1·00], p=0·049) were associated with lower mortality.
Interpretation Unacceptable differences in mortality exist for gastrointestinal congenital anomalies between low-income,
middle-income, and high-income countries. Improving access to quality neonatal surgical care in LMICs will
be vital to achieve Sustainable Development Goal 3.2 of ending preventable deaths in neonates and children younger
than 5 years by 2030.
Reducing the environmental impact of surgery on a global scale: systematic review and co-prioritization with healthcare workers in 132 countries
Abstract
Background
Healthcare cannot achieve net-zero carbon without addressing operating theatres. The aim of this study was to prioritize feasible interventions to reduce the environmental impact of operating theatres.
Methods
This study adopted a four-phase Delphi consensus co-prioritization methodology. In phase 1, a systematic review of published interventions and global consultation of perioperative healthcare professionals were used to longlist interventions. In phase 2, iterative thematic analysis consolidated comparable interventions into a shortlist. In phase 3, the shortlist was co-prioritized based on patient and clinician views on acceptability, feasibility, and safety. In phase 4, ranked lists of interventions were presented by their relevance to high-income countries and low–middle-income countries.
Results
In phase 1, 43 interventions were identified, which had low uptake in practice according to 3042 professionals globally. In phase 2, a shortlist of 15 intervention domains was generated. In phase 3, interventions were deemed acceptable for more than 90 per cent of patients except for reducing general anaesthesia (84 per cent) and re-sterilization of ‘single-use’ consumables (86 per cent). In phase 4, the top three shortlisted interventions for high-income countries were: introducing recycling; reducing use of anaesthetic gases; and appropriate clinical waste processing. In phase 4, the top three shortlisted interventions for low–middle-income countries were: introducing reusable surgical devices; reducing use of consumables; and reducing the use of general anaesthesia.
Conclusion
This is a step toward environmentally sustainable operating environments with actionable interventions applicable to both high-income and low–middle-income countries.
Knowledge graph mining with latent shape graphs
Knowledge graphs are graph-structured knowledge bases that have been shown to be of great value in many Artificial Intelligence applications in academia and industry alike. They are typically generated automatically from unstructured or semi-structured data sources. The growing adoption of knowledge graphs has been limited by multiple challenges arising from the size and quality of the information they contain. This thesis explores the relationship between the quality of knowledge graphs and the machine learning technologies used to discover and extract knowledge from them. We focus on quality in terms of completeness and consistency.
Knowledge graphs provide the flexibility required for representing knowledge at different scales in open environments such as the Web. However, this versatility gives them an ever-changing schema, which makes their content hard to summarize and understand. Moreover, they are typically never complete, even in very specific domains, and their consistency with respect to a given schema or ontology cannot be guaranteed without the corresponding validation. The lack of an accurate schema has been shown to be problematic in use cases where applications need to rely on the data satisfying a set of constraints.
The contribution of this thesis is twofold. Firstly, we propose a scalable data-driven method to exhibit the actual (latent) shape of graph data. We introduce an algorithm for mining relation cardinality bounds and building so-called shapes that exhibit important aspects of the structure (or topological information) of entities and relations in a knowledge graph. Latent shapes also allow us to formalise an approximate algorithm for validating the structure of knowledge graphs. Secondly, we exploit the latent shapes of entities and relations to enhance the performance of machine learning models that aim to predict missing links and complete knowledge graphs. We use local pattern information and graph-based feature models in the Bioinformatics domain to improve the prediction of adverse drug reactions, achieving new state-of-the-art results. Finally, we extend latent feature models by encoding the cardinality of relations as a regularisation term used to learn semantic embeddings that improve the precision of downstream prediction tasks in benchmark datasets.
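The idea of mining relation cardinality bounds can be illustrated with a minimal sketch. The thesis's actual algorithm is not described in this abstract, so the approach below is an assumption: for each relation, count the distinct objects attached to each subject and report the smallest and largest count observed. The function name and the toy triples are hypothetical.

```python
from collections import defaultdict

def cardinality_bounds(triples):
    """Return {relation: (min, max)} over the number of distinct objects
    per subject. Only subjects that use the relation at least once are
    counted, so the observed minimum is never 0."""
    objects = defaultdict(set)
    for s, p, o in triples:
        objects[(p, s)].add(o)
    bounds = {}
    for (p, _subject), objs in objects.items():
        n = len(objs)
        lo, hi = bounds.get(p, (n, n))
        bounds[p] = (min(lo, n), max(hi, n))
    return bounds

triples = [
    ("alice", "hasParent", "bob"),
    ("alice", "hasParent", "carol"),
    ("dave", "hasParent", "erin"),
    ("alice", "bornIn", "galway"),
]
bounds = cardinality_bounds(triples)
```

On this toy graph, `hasParent` is observed with between one and two objects per subject and `bornIn` with exactly one. Bounds mined this way describe the data as it is, not as a schema says it should be, so noisy or incomplete graphs would need a more robust estimate than a raw minimum and maximum.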
Using linked data to mine RDF from wikipedia's tables
The tables embedded in Wikipedia articles contain rich, semi-structured encyclopaedic content. However, the cumulative content of these tables cannot be queried. We thus propose methods to recover the semantics of Wikipedia tables and, in particular, to extract facts from them in the form of RDF triples. Our core method uses an existing Linked Data knowledge base to find pre-existing relations between entities in Wikipedia tables, and suggests that the same relations hold for other entities in analogous columns on different rows. We find that such an approach extracts RDF triples from Wikipedia's tables at a raw precision of 40%. To improve the raw precision, we define a set of features for extracted triples that are tracked during the extraction phase. Using a manually labelled gold standard, we then test a variety of machine learning methods for classifying correct/incorrect triples. One such method extracts 7.9 million unique and novel RDF triples from over one million Wikipedia tables at an estimated precision of 81.5%.
This work was supported in part by Fujitsu (Ireland) Ltd., by the Millennium Nucleus Center for Semantic Web Research under Grant NC120004, and by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.
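The core extraction idea described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: table cells are taken to be already linked to entities, the knowledge base is a plain set of triples, and the function name, the `min_support` threshold and the example data are all hypothetical.

```python
from collections import Counter

def suggest_triples(rows, col_a, col_b, kb, min_support=2):
    """Find relations the knowledge base already holds between two columns,
    then propose them for rows where the relation is not yet known."""
    known = {}
    for s, p, o in kb:
        known.setdefault((s, o), set()).add(p)

    # Count how many rows support each relation between the two columns.
    support = Counter()
    for row in rows:
        for p in known.get((row[col_a], row[col_b]), ()):
            support[p] += 1

    # Propose well-supported relations for the remaining rows.
    candidates = []
    for p, n in support.items():
        if n < min_support:
            continue
        for row in rows:
            triple = (row[col_a], p, row[col_b])
            if triple not in kb:
                candidates.append(triple)
    return candidates

kb = {("Dublin", "capitalOf", "Ireland"), ("Paris", "capitalOf", "France")}
rows = [("Dublin", "Ireland"), ("Paris", "France"), ("Santiago", "Chile")]
new_triples = suggest_triples(rows, 0, 1, kb)
```

Here two rows already support `capitalOf`, so the relation is proposed for the third row. Candidates produced this way are only suggestions, which is consistent with the abstract's report of a low raw precision followed by a classification step to filter them.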
On learnability of constraints from RDF data
RDF is structured, dynamic, and schemaless data, which gives Linked Data a great deal of flexibility to be made available in an open environment such as the Web. However, for RDF data, this flexibility turns out to be the source of many data quality and knowledge representation issues. Tasks such as assessing data quality in RDF require a different set of techniques and tools compared to other data models. Furthermore, since the use of existing schema, ontology and constraint languages is not mandatory, there is always room for misunderstanding the structure of the data. Neglecting this problem can threaten the widespread use and adoption of RDF and Linked Data. Users should be able to learn the characteristics of RDF data in order to determine, for example, its fitness for a given use case. For that purpose, in this doctoral research, we propose the use of constraints to inform users about characteristics that RDF data naturally exhibits, in cases where ontologies (or any other form of explicitly given constraints or schemata) are not present or not expressive enough. We aim to address the problems of defining and discovering classes of constraints to help users in data analysis and assessment of RDF and Linked Data quality.
TOMOE project funded by Fujitsu Laboratories Limited and Insight Centre for Data Analytics at NUI Galway
- …